CUDA Programming Guide: Mapping Hardware to Software: Compute Capability Versions

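Compute capability (CC) is the version number that bridges the virtual architecture (PTX) and the real architecture (SASS/binary). Developers use nvcc to target a specific platform, from desktop and server systems to embedded devices, on hosts such as 64-bit Linux (LP64) or 64-bit Windows (LLP64).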

1. Virtual and Real Architectures

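The CUDA Toolkit supports the two most recent major versions of the GPU architecture; see Table 29: Feature Support per Compute Capability (7.5 to 12.x). The mapping between virtual and real architecture is set with compiler flags, for example: nvcc --generate-code arch=compute_80,code=sm_90 prog.cu. Here arch= names the virtual architecture (PTX) and code= names the real architecture (SASS). For newer targets, use a flag such as nvcc -arch=sm_100, or an architecture-specific variant such as nvcc -arch=sm_100a.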

2. Macro Hierarchy

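Code can branch on __CUDA_ARCH__ at compile time. The macro __CUDA_ARCH__ is defined only when device code is being compiled (for example, inside __device__ or __global__ functions); it is undefined in the host compilation pass. Finer-grained control is available through __CUDA_ARCH_SPECIFIC__ and __CUDA_ARCH_FAMILY_SPECIFIC__. Some features, such as distributed shared memory or specific NaN payloads, require compute capability 9.0 or higher, or in some cases compute capability 10.0 or later.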

3. Numerical Limits and Constraints

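Precision varies with compute capability; for example, subnormal handling applies below $2^{-16382} \approx 3.36 \cdot 10^{-4932}$, the smallest normal magnitude of an IEEE 754 binary128 value. Limits such as CUDA_DEVICE_MAX_COPY_CONNECTIONS=16 or the .maxnreg PTX directive are strictly enforced according to the target compute capability.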
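The approximation quoted above can be checked by hand with base-10 logarithms. The following is a worked sketch that uses the rounded value $\log_{10} 2 \approx 0.301029996$, so the last digits are approximate.

```latex
\begin{aligned}
\log_{10}\left(2^{-16382}\right) &= -16382 \cdot \log_{10} 2 \approx -16382 \cdot 0.301029996 \approx -4931.4734 \\
2^{-16382} &\approx 10^{-4931.4734} = 10^{0.5266} \cdot 10^{-4932} \approx 3.362 \cdot 10^{-4932}
\end{aligned}
```

Rounded to three significant digits, this agrees with the $3.36 \cdot 10^{-4932}$ given in the text.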


QUESTION 1

Where is the __CUDA_ARCH__ macro strictly defined?

In both host and device code to identify the GPU hardware.

Only in device code (__device__, __host__ __device__, or __global__).

Only in the host-side main() function.

In the NVML API headers only.

QUESTION 2

Which command correctly targets a virtual arch of 8.0 and a real arch of 9.0?

nvcc -arch=sm_80,code=compute_90

nvcc --generate-code arch=compute_80,code=sm_90

nvcc --target=cc_8.0_9.0

nvcc -arch=sm_90

QUESTION 3

What is the consequence of declaring 'namespace cuda { struct foo; }' in CUDA code?

It enables high-speed memory access.

It is an error; the 'cuda' namespace is reserved.

It is required for using cuda::std::result_of.

It allows the use of __nv_atomic_load.

QUESTION 4

Which mathematical property is associated with CC 9.x and higher?

Support for 16-byte data types.

Basic support for fabs(x) and sin(x).

Introduction of the warpSize macro.

Ability to use x + y in kernels.

QUESTION 5

What happens when an extended lambda is defined inside a generic lambda using Microsoft Visual Studio host compilers?

It compiles successfully and inlines perfectly.

The host compiler may fail to inline or throw an error.

It enables __managed__ memory by default.

It triggers the on-disk JIT compilation cache.